This RMarkdown file contains the report of the data analysis done for the project on forecasting daily bike rental demand using time series models in R. It contains analysis such as data exploration, summary statistics and building the time series models. The final report was completed on Thu Feb 26 18:04:17 2026.
Data Description:
This dataset contains the daily count of rental bike transactions between years 2011 and 2012 in Capital bikeshare system with the corresponding weather and seasonal information.
Data Source: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset
Relevant Paper:
Fanaee-T, Hadi, and Gama, Joao. Event labeling combining ensemble detectors and background knowledge, Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg
## Import required packages
if (!require("pacman")) install.packages("pacman")
## Loading required package: pacman
pacman::p_load(tidyverse, timetk, lubridate, forecast, tseries, ggthemes)
install.packages('psych')
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
data("bike_sharing_daily")
bike_data <-bike_sharing_daily
view(bike_data)
describe(bike_data)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
## vars n mean sd median trimmed mad min max
## instant 1 731 366.00 211.17 366.00 366.00 271.32 1.00 731.00
## dteday 2 731 NaN NA NA NaN NA Inf -Inf
## season 3 731 2.50 1.11 3.00 2.50 1.48 1.00 4.00
## yr 4 731 0.50 0.50 1.00 0.50 0.00 0.00 1.00
## mnth 5 731 6.52 3.45 7.00 6.52 4.45 1.00 12.00
## holiday 6 731 0.03 0.17 0.00 0.00 0.00 0.00 1.00
## weekday 7 731 3.00 2.00 3.00 3.00 2.97 0.00 6.00
## workingday 8 731 0.68 0.47 1.00 0.73 0.00 0.00 1.00
## weathersit 9 731 1.40 0.54 1.00 1.33 0.00 1.00 3.00
## temp 10 731 0.50 0.18 0.50 0.50 0.23 0.06 0.86
## atemp 11 731 0.47 0.16 0.49 0.48 0.20 0.08 0.84
## hum 12 731 0.63 0.14 0.63 0.63 0.16 0.00 0.97
## windspeed 13 731 0.19 0.08 0.18 0.19 0.07 0.02 0.51
## casual 14 731 848.18 686.62 713.00 744.95 587.11 2.00 3410.00
## registered 15 731 3656.17 1560.26 3662.00 3641.72 1712.40 20.00 6946.00
## cnt 16 731 4504.35 1937.21 4548.00 4517.19 2086.02 22.00 8714.00
## range skew kurtosis se
## instant 730.00 0.00 -1.20 7.81
## dteday -Inf NA NA NA
## season 3.00 0.00 -1.35 0.04
## yr 1.00 0.00 -2.00 0.02
## mnth 11.00 -0.01 -1.21 0.13
## holiday 1.00 5.63 29.75 0.01
## weekday 6.00 0.00 -1.26 0.07
## workingday 1.00 -0.79 -1.38 0.02
## weathersit 2.00 0.95 -0.15 0.02
## temp 0.80 -0.05 -1.12 0.01
## atemp 0.76 -0.13 -0.99 0.01
## hum 0.97 -0.07 -0.08 0.01
## windspeed 0.49 0.67 0.39 0.00
## casual 3408.00 1.26 1.29 25.40
## registered 6926.00 0.04 -0.72 57.71
## cnt 8692.00 -0.05 -0.82 71.65
summary(bike_data)
## instant dteday season yr
## Min. : 1.0 Min. :2011-01-01 Min. :1.000 Min. :0.0000
## 1st Qu.:183.5 1st Qu.:2011-07-02 1st Qu.:2.000 1st Qu.:0.0000
## Median :366.0 Median :2012-01-01 Median :3.000 Median :1.0000
## Mean :366.0 Mean :2012-01-01 Mean :2.497 Mean :0.5007
## 3rd Qu.:548.5 3rd Qu.:2012-07-01 3rd Qu.:3.000 3rd Qu.:1.0000
## Max. :731.0 Max. :2012-12-31 Max. :4.000 Max. :1.0000
## mnth holiday weekday workingday
## Min. : 1.00 Min. :0.00000 Min. :0.000 Min. :0.000
## 1st Qu.: 4.00 1st Qu.:0.00000 1st Qu.:1.000 1st Qu.:0.000
## Median : 7.00 Median :0.00000 Median :3.000 Median :1.000
## Mean : 6.52 Mean :0.02873 Mean :2.997 Mean :0.684
## 3rd Qu.:10.00 3rd Qu.:0.00000 3rd Qu.:5.000 3rd Qu.:1.000
## Max. :12.00 Max. :1.00000 Max. :6.000 Max. :1.000
## weathersit temp atemp hum
## Min. :1.000 Min. :0.05913 Min. :0.07907 Min. :0.0000
## 1st Qu.:1.000 1st Qu.:0.33708 1st Qu.:0.33784 1st Qu.:0.5200
## Median :1.000 Median :0.49833 Median :0.48673 Median :0.6267
## Mean :1.395 Mean :0.49538 Mean :0.47435 Mean :0.6279
## 3rd Qu.:2.000 3rd Qu.:0.65542 3rd Qu.:0.60860 3rd Qu.:0.7302
## Max. :3.000 Max. :0.86167 Max. :0.84090 Max. :0.9725
## windspeed casual registered cnt
## Min. :0.02239 Min. : 2.0 Min. : 20 Min. : 22
## 1st Qu.:0.13495 1st Qu.: 315.5 1st Qu.:2497 1st Qu.:3152
## Median :0.18097 Median : 713.0 Median :3662 Median :4548
## Mean :0.19049 Mean : 848.2 Mean :3656 Mean :4504
## 3rd Qu.:0.23321 3rd Qu.:1096.0 3rd Qu.:4776 3rd Qu.:5956
## Max. :0.50746 Max. :3410.0 Max. :6946 Max. :8714
#convert date column to date type
bike_data$dteday <- as.Date(bike_data$dteday)
boxplot(bike_data)
ggplot(bike_data, aes(x = dteday, y = cnt)) +
geom_line() +
labs(title = "Daily Bike Rentals", x = "Date", y = "Count")
#mean and median temps over seasons
bike_data %>%
group_by(season) %>%
summarize(mean_temp = mean(temp), median_temp = median(temp))
## # A tibble: 4 × 3
## season mean_temp median_temp
## <dbl> <dbl> <dbl>
## 1 1 0.298 0.286
## 2 2 0.544 0.562
## 3 3 0.706 0.715
## 4 4 0.423 0.409
# Mean temperature, humidity, wind speed, and total rentals per month
bike_data %>%
group_by(mnth) %>%
summarize(mean_temp = mean(temp),
mean_humidity = mean(hum),
mean_windspeed = mean(windspeed),
total_rentals = sum(cnt))
## # A tibble: 12 × 5
## mnth mean_temp mean_humidity mean_windspeed total_rentals
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.236 0.586 0.206 134933
## 2 2 0.299 0.567 0.216 151352
## 3 3 0.391 0.588 0.223 228920
## 4 4 0.470 0.588 0.234 269094
## 5 5 0.595 0.689 0.183 331686
## 6 6 0.684 0.576 0.185 346342
## 7 7 0.755 0.598 0.166 344948
## 8 8 0.709 0.638 0.173 351194
## 9 9 0.616 0.715 0.166 345991
## 10 10 0.485 0.694 0.175 322352
## 11 11 0.369 0.625 0.184 254831
## 12 12 0.324 0.666 0.177 211036
# Temperature association with bike registered/casual rentals
ggplot(bike_data, aes(x = temp)) +
geom_point(aes(y = registered, color = "Registered")) +
geom_point(aes(y = casual, color = "Casual")) +
labs(title = "Temperature vs. Bike Rentals", x = "Normalized Temperature", y = "Count") +
scale_color_manual(values = c("Registered" = "blue", "Casual" = "red"))
## Read about the timetk package
# ?timetk
bike_data %>%
plot_time_series(.date_var = dteday, .value = cnt, .interactive = TRUE, .plotly_slider = TRUE, .color_var = year(dteday))
bike_data %>%
plot_time_series(.date_var = dteday, .value = cnt, .smooth = TRUE)
# Load additional required packages
pacman::p_load(zoo, TTR)
# Clean the time series data
bike_data_clean <- bike_data %>%
mutate(cnt_clean = tsclean(ts(cnt, frequency = 365)))
# Plot cleaned data
ggplot(bike_data_clean, aes(x = dteday)) +
geom_line(aes(y = cnt, color = "Original")) +
geom_line(aes(y = cnt_clean, color = "Cleaned")) +
labs(title = "Cleaned Daily Bike Rentals", x = "Date", y = "Count") +
scale_color_manual(values = c("Original" = "black", "Cleaned" = "red"))
# Apply Simple Moving Average (SMA)
bike_data_clean <- bike_data_clean %>%
mutate(cnt_sma = SMA(cnt_clean, n = 10))
# Plot smoothed data
ggplot(bike_data_clean, aes(x = dteday)) +
geom_line(aes(y = cnt_clean, color = "Cleaned")) +
geom_line(aes(y = cnt_sma, color = "Smoothed (SMA)")) +
labs(title = "Smoothed Daily Bike Rentals", x = "Date", y = "Count") +
scale_color_manual(values = c("Cleaned" = "blue", "Smoothed (SMA)" = "red"))
## Don't know how to automatically pick scale for object of type <ts>. Defaulting
## to continuous.
## Warning: Removed 9 rows containing missing values (`geom_line()`).
# Apply Simple Exponential Smoothing
bike_ts<- ts(bike_data_clean$cnt_clean, frequency = 365)
fit_ets <- HoltWinters(bike_ts)
# Plot Exponential Smoothing
plot(fit_ets)
bike_decomp <- stl(bike_ts, s.window = "periodic")
plot(bike_decomp)
#fit ARIMA model
fit <- auto.arima(bike_ts, seasonal = TRUE)
summary(fit)
## Series: bike_ts
## ARIMA(1,0,3)(0,1,0)[365] with drift
##
## Coefficients:
## ar1 ma1 ma2 ma3 drift
## 0.9683 -0.5912 -0.1279 -0.0937 5.7116
## s.e. 0.0224 0.0571 0.0617 0.0576 0.8318
##
## sigma^2 = 986021: log likelihood = -3042.81
## AIC=6097.63 AICc=6097.86 BIC=6121.05
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 5.853004 697.8113 385.8648 -2.699883 9.189324 0.1694626
## ACF1
## Training set -0.003587809
#forecast data
bike_forecast <- forecast(fit,h=60)
#plot
autoplot(bike_forecast) +
labs(title = "Bike Rental Forecast for Next 60 Days", x = "Date", y = "Count")
Seasonal Patterns: Bike rentals exhibit clear seasonal patterns, with a higher number of rentals during warmer months and lower number of rentals in colder months. Suggesting that weather plays a significant role in bike rental.
Model Accuracy: The ARIMA model captured the overall trend and seasonality well, showing that it may be a useful tool for predicting future bike rental demand.
Next steps would be to look further into the correlation of different variables, such as does windspeed effect rentals or workday vs weekend vs holiday. These can give further insight into what is the main factors for renting a bike.